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Abstract: Lattice QCD with chiral fermions are extremely computationally expensive, but on the other hand provides an 
accurate tool for studying the physics of strong interactions. Since the truncated overlap variant of domain wall fermions are 
equivalent to overlap fermions in four dimensions at any lattice spacing, in this paper we have used domain wall fermions for 
our simulations. The physical information of lattice QCD theory is contained in quark propagators. In practice computing 
quark propagator in lattice is an inversion problem of the Dirac operator matrix representing this quarks. In order to develop 
fast inversion algorithms we have used overlap solvers in two dimensions. Lattice QED theory with U(l) group symmetry in 
two dimensional space-times dimensions has always been a testing ground for algorithms. By the other side, motivated by our 
previews work that the two -grid algorithm converge faster than the standard iterative methods for overlap inversion but not 
for all quark masses, we thought to test this idea in less dimensions such as U(l) gauge theory. Our main objective of this 
paper it is to implement and develop the idea of a two level algorithm in a new algorithm coded in QCDLAB. This 
implementation is presented in the preconditioned GMRESR algorithm, as our new contribution in QCDLAB package. The 
preconditioned part of our algorithm, different from the one of [18], is the approximation of the overlap operator with the 
truncated overlap operator with finite N3 dimension. We have tested it for 100 statistically independent configurations on 32 
x 32 lattice background U(l) field at coupling constant P=l and for different bare quark masses mq = [0.5, 0.45, 0.4, 0.35, 0.3, 
0.25, 0.2, 0.15, 0.1]. We have compared the convergence history of the preconditioned GMRESR residual norm with another 
overlap inverter of QCDLAB as an optimal one, such as SHUMR. We have shown that our algorithm converges faster than 
SHUMR for different quark masses. Also, we have demonstrated that it saves more time for light quarks compared to 
SHUMR algorithm. Our algorithm is approximately independent from the quark mass. This is a key result in simulations 
with chiral fermions in lattice theories. By the other side, if we compare the results of [18] for quark mass 0.1 in SU(3), results 
that our chosen preconditioned saves a factor of 2 but in U(l). Our next step is to test this algorithm in SU(3) and to adopt it in 
parallel. 
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1. Introduction 

Quantum Chromodynamics (QCD) is the quantum theory 
of interacting quarks and gluons. It should explain the 
physics of strong force from low to high energies. Due to 
asymptotic freedom of quarks at high energies [1], it is 
possible to carry out perturbative calculations in QCD and 
thus succeeding in explaining a range of phenomena. At low 
energies quarks are confined within hadrons [1] and the 
coupling between them is strong. This requires non- 
perturbative calculations. The direct approach is known to 
be the lattice approach. The lattice regularization of gauge 
theories was proposed by [Wilson 1974]. It defines the 
theory in an Euclidean 4-dimensional finite and regular 



lattice with periodic boundary conditions. 

Lattice regularization of chiral fermions is an important 
development of the theory of elementary particles. After 
many years of research in lattice QCD, it was possible to 
formulate QCD with chiral fermions on the lattice. There are 
two chiral formulations: a) domain wall fermions [2, 3] and 
b) overlap fermions [4, 5], which are closely related [6]. In 
particular, the truncated overlap variant of domain wall 
fermions [7] can be shown to be equivalent to overlap 
fermions in four dimensions at any lattice spacing [8]. 

The physical information of these theories is contained in 
quark propagators, which are then combined to construct 
meson, nucleon and other elementary particle propagators. 
One of the basic major computing problem in lattice QCD 
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simulations is the calculation of quark propagator. Generally, 
this problem leads to very intensive computations and 
requires high-end computing platforms. For this purpose, 
simulations of lattice theory with U(l) group symmetry (a 
unitary symmetry of one dimension that corresponds to a 
symmetry under transformations of phase), or the case of the 
Quantum Electrodynamics, in two dimensional space-times 
has always been a testing ground for algorithms. 

In this work we develop in fact prototypes of algorithms 
that are using high-end computing as our contribution to 
QCDLAB package [9, 10] for simulating chiral fermions. 
This package, a MATLAB/OCTAVE based environment, 
allows fast prototyping of linear algebraic computations and 
thus accelerates the process of finding the most efficient 
fermion algorithm. In practice computing quark propagator 
in lattice is an inversion problem of the Dirac operator 
matrix representing these quarks. Motivated by the results of 
the two grid algorithm that we proposed [11, 12] as the 
fastest overlap solver in SU(3) gauge theory, which didn't 
converges for all quark masses, we thought to test this idea 
in less dimensions such as U(l) gauge theory. In our case we 
want to calculate the domain wall fermion propagator, but in 
order to develop fast algorithms, we use the truncated 
overlap variant of domain wall fermions in 2+1 dimensions 
with the extra finite dimension N3. 

2. The Chiral Overlap Operator 

Quark propagator computations amount to solving large 
linear systems of the type: 

Dx = b (1) 

where D e C' Vx V is a sparse and large matrix operator 
representing the Dirac operator on a regular four 

dimensional space -time lattice, x,be C A are the quark 
propagator and it's source. In order to simulate chiral 
fermions on lattice QCD one can use as Dirac operator the 
chiral Dirac operator that is representing by the Neuberger 
operator, which is a shifted unitary matrix of the form [13] 

D N = Cl I-c 2 V, (2) 

where V = A(A + Ay 112 is a unitary matrice, / unit matrice 

and A = M-aD w . The overlap operator, D N it's 
non-hermitian and it is a dense and large matrice. This 
operator can be expressed in the equivalent way from 
signum function [13], 

D N =c 1 I-c 2 r 5 sign(H w ) (3) 

where H w = y 5 (M - aD w ) . For the signum function to be 

nontrivial, the Wilson-Dirac operator should be indefinite, 
which is the case if its bare mass M is sufficiently negative 
and it is usually taken to be in the interval (-2,0). a is the 
lattice parameter, c, and c 2 are two constants, 



Cl =(l+mJ/2, c 2 =(l-mJ/2 (4) 
m q is bare quark mass and D w the Wilson-Dirac operator, 

^=V2lkK +3 A.)- fl3 ^J (5) 

M 

with 3^ and 3^ as the the nearest-neighbor forward and 

backward difference operators. These operators are unitary 3 
by 3 matrices with determinant one that are associated with 
links of the lattice and are oriented positively. A set of such 
matrices forms a "configuration". y M ,jU = 1,...,5 are 4 by 4 

matrices related to the spin of the particle. Therefore, if there 
are N lattice points, the matrix is of order 12N. In the case of 
U(l) lattice gauge theory the complexity of this matrix it is 

2N. 

3. Fast Inverting Algorithms 

In this paper we use a special package software named 
QCDLAB [9, 10] which is a design and research tool for 
lattice QCD algorithms. It is a collection of MATLAB 
functions,that is based on a "small-code" and a 
"minutes-run-time" algorithmic design philosophy. In this 
work we use the QCDLAB 1.0 version [9] with the 
Schwinger model on the lattice, a great simplification, which 
shares many features and algorithms with lattice QCD. In 
QCDLAB 1.0, the overlap operator L> can be computed 
from the inverse square root of ^ A : 

(A* : Af 1/2 =V1T X V* (6) 

* 

where Z are the singular values of A = VLV . Since we 
require only the quark propagator of overlap chiral fermions, 
the full matrix D N is not required. The multiplication of 
D N with a vector can be computed using Krylov subspace 
algorithms [14] such as the double pass Lanczos algorithm 
[15] and Zolotarev approximation [16] for the inverse square 
root of the Lanczos matrix T. The algorithm requires a lower 

bound lambda of the smallest eigenvalue of A A _ This idea 
is implemented in SHUMR (Shifted Unitary Minimal 
Residual) optimal algorithm which has been part of 
QCDLAB 1.0., and it is shown in the Appendix C. This 
algorithm is tested before also in case of SU(3) gauge theory 
and gives good results [17]. Based on the idea of a two level 
algorithm as proposed in [1 1, 12], in this work we develop a 
faster algorithm, the preconditioned GMRESR [18] 
(Generalized Minimal Residual Method - Recursive) 
algorithm developed it as part of QCDLAB 1 .0 package, in 
U(l) gauge field background. So in this way we have added 
in QCDLAB 1.0 new efficient routines. 

In reference [18] the preconditioner is built upon an 
inaccurate approximation to the sign function, while we use 
as the preconditioned part the approximation of the overlap 
operator with the truncated overlap operator with finite N3 
dimension. The preconditioned GMRESR algorithm uses 
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the truncated Overlap operator along N g , jfOV , not ALGORITHM: The preconditioned GMRESR (A, D TOV , b, c„ c 2 , tol) 



the real overlap operator as SHUMR, so maybe we lose the 
accuracy of the exact overlap operator but we gain from the 
low complexity of the truncated overlap version. In order to 
control the residual norm of this algorithm we use the true 
residual norm from the exact solution. In general for the 
solution of the linear system q x _ £ we investigate this 

condition in the £ -th step, 



Ib-Dx* 



-(b-Dx*t 



+ 



true 

residual 



residual 
gap 



computed 
residual 



(7) 



and develop a strategy to bound residual gap below required 
accuracy. If the number of iterations to reach the desired 
residual reduction is large, then there can be a considerable 
accumulation of the errors in the matrix-vector product in 
the residual gap. In practice, this might mean that the 
tolerance on the matrix-vectors has to be decreased. In 
essence the preconditioned GMRESR follows two levels; 
First level: The preconditioned part: Compute approximate 
solution for the linear system using the equivalent of the 
overlap operator, the truncated overlap operator in 2 + 1 
dimensions with an extra finite N 3 dimension using small 
accuracy tolO. In this case we have used the CGNE 
(Conjugate Gradients on Normal Equations) algorithm as an 
known optimal solver [19]. 

Second level: Find the true residual using the real overlap 
operator in N 3 infinite dimension, and use this residual to 
control the residual of the first step in each iteration. The 
Matlab/Octave code function for this second step we called 
MultOverlap.m and it is shown in the Appendix B. This 
cycle will continue still is reached a desired tolerance tol. 

The Matlab/Octave code function of the preconditioned 
GMRESR algorithm is shown in the Appendix A. Below we 
are giving briefly the preconditioned GMRESR algorithm 
for inverting truncated overlap operator in 2+1 dimension, 
so in U(l) gauge theory. 

ALGORITHM: The preconditioned GMRESR (A, D TOV , b, c,, c 2 , tol) 

#computes X with VDx — Z>|| < tol ■ ||fo|| 

# cl, c2 the coefficients of the overlap operator 

# inital values 

x = 0; 
r = b; 

#empty matrix 

c=D; 

#outer iteration, the preconditioned part 

while ||r|| > tol ■ \b\ do 

solve D T0V U = r , to relative accuracy tolO , 



from u = cgne(D , r,to/0); 

#inner iteration, compute the true residual from c with 
#||D A ' M -c||<to/-||&||-|| M ||/||r|| 

C = Cj -W + C 2 -{V -w) , where V-u is computed from 
V ■ u - Mult _ Overlap(A, u); 
for i = \:(C,T)do 

a = C[:,if -c; 
c = c-a-C[:,i]; 
u = u -a-U[:,i]; 

end for 

# calculate 

c = c/||c||; u = ul\ c I; 

C = [C,c]; U = [U,u\, 

y—c^-r; x = x+y-u; r = r-y-c; 

end while 



4. Results and Discussion 

We have made simulation for 100 statistically 
independent configurations on 32 x 32 lattice background 
U(l) field at coupling constant ft = 1 and for different bare 

quark masses m q = [0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 
0.15, 0.1]. The third dimension for truncated overlap 

fermions we set is = 8 and to/0 = 0.1;to/ = 10~ 6 . The 
range parameter M we have fixed at -0.345 and we have 
taken the lower bound lambda = 10" 7 which requires the 
Zolotarev rational polynomial to be of the order n = 60, i.e. the 
number of T inversions is thus n/2 = 30. We compare the 
convergence history of the preconditioned GMRESR residual 
norm with SHUMR. The graphical results of the history of the 
convergence for each algorithm shows that our algorithm 
converges faster than SHUMR for different quark masses 
(Fig. 1 , Fig.2, Fig.3). As we can see from Fig. 1 , for quark mass 
0.5 our algorithm saves a factor of 2, from Fig.2 for quark 
mass 0.3 it saves a factor of 3 and from Fig.3 for quark mass 
0. 1 it saves a factor of 7. So, what is more important is the fact 
that our algorithm is faster and we gain more time for light 
quarks. Also, the Fig.3 shows that our algorithm in 
two-dimension saves a factor of 2 compare to the results of 
[18] for the same quark mass but in SU(3) gauge theory. This 
result must be checked for our algorithm in four dimension in 
our future work. We have calculated also the inverting time 
required till the convergence for each algorithm for different 
quark masses that is shown in Fig. 4. 

5. Conclusions 

The chiral symmetry in lattice QCD is crucial because this 
property is fundamental for the strong interactions. 
Simulations in lattice with chiral fermions have high 
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computation cost because of the complex form of operator 
that is related to them, the overlap operator. We showed that 
our algorithm, the preconditioned GMRESR, converges faster 
compared to an optimal solver of overlap operator, such as 
SHUMR, for different quark masses. Also, we showed that it 
saves much more time for light quarks compared to other 
algorithm and that it is approximately independent from the 
quark mass. In this way our algorithm avoid what it is called 
critical slowing down of the algorithms with light quarks. 
This is a key result in simulations with chiral fermions in 
lattice theories. From the results obtained seems that our 
algorithm is very promising and in our further studies we want 
to adopt this algorithm in parallel. 
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Figure 1. The history of the convergence of the norm of residual as the 
function of the number of D w multiplications for SHUMR and the 
preconditioned GMRESR inverting overlap operator on 32 x 32 lattice 
background U(l) field at coupling constant P = \ and quark mass 0.5(in 
lattice unit). 
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Figure 2. The history of the convergence of the norm of residual as the 
function of the number of DW multiplications for SHUMR and the 
preconditioned GMRESR inverting overlap operator on 32 x 32 lattice 
background U(l) field at coupling constant j5 = \and quark mass 0.3(in 
lattice unit). 
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Figure 3. The history of the convergence of the norm of residual as the 
function of the number of DW multiplications for SHUMR and the 
preconditioned GMRESR inverting overlap operator on 32 x 32 lattice 
background U(l) field at coupling constant fi = \ and quark mass 0.1(in 
lattice unit). 




0 I 1 1 1 1 1 1 j 

0.1 0.15 0.2 0.25 0.3 0,33 0.4 0.45 0,5 
bare ciuarkmass 

Figure 4. The inverting time of the overlap operator as function of the 
quark mass (in lattice unit) for SHUMR and the preconditioned GMRESR 
algorithms on 32 x 32 lattice background U(l) field at coupling 
constant ft = 1 . 

Appendix A 

The preconditioned GMRESR code function for inverting 
truncated overlap operator in U(l) gauge theory. 



function[x,Matvec,rr]=gmresr(A,psi,lambda,cl,c2, 
M,Ml,P,b,tolO,tol,nmax) 

n=max(size(b)); N3=max(size(P))/n; matvec=0; 

Matvec=matvec; 

k=l; b=b(:); r=b; x=zeros(n,l); 

rnorrrr=norm(r); rr=rnorm; U=[]; C=[]; 

while ((rnorm>tol)&&(k<=nmax)); 

% the preconditioned part, compute approximate solution 
using the approximate overlap operator the so called 
truncated overlap oeprator with accuracy tolO 

rl=zeros(n*N3,l); rl(l:n)=r; 

rl=Ml*(P*rl); 
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[ul,rrM]=cgne(M,rl,rl,tol0*morm,n*N3); 
multDw=2*N3*max(size(rrM)); 
u=P'*ul; u=u(l:n); 
% Find the true residual using the exact overlap operator 
[Vu,multA]=Mult_overlap(A,u,psi,lambda,n*N3); 
c=cl*u+c2*Vu; 
for i=l:k-l; 

alpha=C(:,i)'*c; c=c-alpha*C(:,i); u=u-alpha*U(:,i); 
end 

beta=norm(c); 

c=c/beta; u=u^eta; U=[U,u]; C=[C,c]; 

gamma=c'*r; 

x=x+u*gamma; 

r=r-c*gamma; 

rnorm=norm(r); rr=[rr;rnorm]; 

matvec=matvec+multDw+multA; 
Matvec=[Matvec,matvec] ; 

k=k+l; 
end 



Appendix B 

Mult Overlap code function using exact overlap operator 



function 

[Vx,Matvec,rr]=Mult_overlap(A,x,psi,lambda,nmax) 

% Given the Dirac operator A (shifted adequately) this 

function 

% multiplies x with the unitary part U*V of A=U*S*V 
% using Lanczos algorithm to compute inverse square root 
ofA'*A 

% and Zolotarev approximation for the inverse suqare root 
ofT 

% uses proejction of smallest eigenpairs (psi,lambda) of 
A'*A 

Matvec=0; 
AHA=A'*A; 

Qx=x-psi*(psi'*x); % projected right hand side 
[Tl,matvec,rr]=lanczos(AHA,Qx,psi,lambda,le-6,nmax,[],[ 

],0); 

Matvec=Matvec+matvec ; 

T=Tl;T(end,:)=[]; 

% compute l/sqrt(T)*el 

epsilon=max(real(diag(lambda))); % smallest eigenvalue of 
A'*A 

n=60; % order of Zolotarev approximation 

[d,q,y,Delta]=coef_zolotarev(l,n,epsilon); 

nlanczos=max(size(T)); 

e 1 =zeros(nlanczos, l);el ( 1)= 1 ; 

dCMd( 1 ) ;d( 1 H] ;n2=max(size(d)) ; 

inv_sqrt_T=dO*e 1 ; 

for 1=1 :n2; 

Tr=sparse(epsilon*speye(nlanczos)+T*q(l)); 
inv_sqrt_T=inv_sqrt_T+T*d(l)*(Tl\el); % n2 inversions 
ofTl 



end 

inv_sqrt_T=inv_sqrt_T *norm(x) ; 
% second call to lanczos 

[y,matvec]=lanczos(AHA,Qx,psi,lambda, 1 e-6,nmax,T 1 ,inv 
_sqrt_T,l); 

Matvec=Matvec+matvec ; 

inv_sqrt_lambda=diag(l./sqrt(real(diag(lambda)))); 
yl=y/sqrt(epsilon)+psi*(inv_sqrt_lambda*(psi'*x)); 
Vx=A*yl; 



Appendix C 

The SHUMR code function for inverting overlap operator 
in U(l) gauge theory. 

function[x,Matvec,rr]=SHUMR(A,psi,lambda,b,cl,c2,tol,i 
max); 

b=b(:); N=max(size(b)); vzero=zeros(N,l); x=vzero; 

r=b; matvec=0; Matvec=matvec; 

rho = norm(r); rnorm=rho; rr=rho; alpha = rho; 

ul2=0; beta=l; Lll tilde=l; q=r/rho; 

q_old=vzero; v_old=vzero; w_old=vzero; s_old=vzero; 

% start added lines 

c_kml=l; s_kml=0; c_km2=0; s_km2=0; 
pl=vzero; p2=vzero; Dpl=vzero; Dp2=vzero; 
% end added lines 
counter = 1 ; 

while ( (rnorm > tol) & (counter<=imax) ); 

[v,multDw]=Mult_overlap(A,q,psi,lambda,imax); 
if (counter > 1), 

ul2=-(q_old'*v)/(q_old'*v_old); 
end 

gamma=-cl*ul2; 

LI Hq'*v)+ul2*(q'*v_old); 

q_tilde=v-Ll 1 *q+ul2*v_old; 

L2 1 =norm(q_tilde) ; 

if (L21<=tol), break, end; 

w=q+q_old*u 1 2+w_old*gamma/L 1 ltilde; 
s=cl*(q+q_old*ul2)+c2*(v+v_old*ul2)+s_old*gamma/Ll 
ltilde; 

L 1 l_tilde=c 1 +c2*L 1 1 -beta*gamma/L 1 ljilde; 
alpha=alpha*beta/Ll ltilde; 
x=x+w*alpha; 
r=r-s*alpha; 
q_old=q; 
v_old=v; 
w_old=w; 
s_old=s; 
q=q_tilde/L21; 
beta=-c2*L21; 
rnorm=norm(r); 
% start added lines 
tll=cl+c2*Lll; 

mu=tl 1 *c_kml+gamma*conj(s_kml)*c_km2; 

nu=c2*L21; 

if(mu !=0), 
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c_k=abs(mu)/sqrt(abs(mu)*abs(mu)+abs(nu)*abs(nu)); 

s_k=conj (c_k*nu/ mu) ; 

else 

c_k=0; 

s_k=l; 

end 

omega=nu*alpha*s_k; 

mu_k=c_k*mu+s_k*nu; 

eps=tl 1 *s_kml -gamma*c_kml *c_km2; 

theta=-gamma* s_km2 ; 

p=(q+q_old*ul2-p 1 *eps-p2*theta)/mu_k; 
Dp=(c 1 *(q+q_old*u 1 2)+c2*(v+v_old*u 1 2)-Dp 1 *eps-Dp2 
*theta)/mu_k; 

morm_p=norm(r+omega*Dp); 

xp=x-omega*p; 

c_km2=c_kml; s_km2=s_kml; p2=pl; Dp2=Dpl; 

c_kml=c_k; s_kml=s_k; pl=p; Dpl=Dp; 
% end added lines 

rr=[rr;rnorm]; 

matvec=matvec+multDw; 

Matvec=[Matvec,matvec] ; 

counter++; 
end 
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